Marek’s Disease Virus Virulence Genes Encode Circular RNAs

ABSTRACT Circular RNAs (circRNAs) are a recently rediscovered class of functional noncoding RNAs that are involved in gene regulation and cancer development. Next-generation sequencing approaches identified circRNA fragments and sequences underlying circularization events in virus-induced cancers. In the present study, we performed viral circRNA expression analysis and full-length sequencing in infections with Marek’s disease virus (MDV), which serves as a model for herpesvirus-induced tumorigenesis. We established inverse PCRs to identify and characterize circRNA expression from the repeat regions of the MDV genome during viral replication, latency, and reactivation. We identified a large variety of viral circRNAs through precise mapping of full-length circular transcripts and detected matching sequences with several viral genes. Hot spots of circRNA expression included the transcriptional unit of the major viral oncogene encoding the Meq protein and the latency-associated transcripts (LATs). Moreover, we performed genome-wide bioinformatic analyses to extract back-splice junctions from lymphoma-derived samples. Using this strategy, we found that circRNAs were abundantly expressed in vivo from the same key virulence genes. Strikingly, the observed back-splice junctions do not follow a unique canonical pattern, compatible with the U2-dependent splicing machinery. Numerous noncanonical junctions were observed in viral circRNA sequences characterized from in vitro and in vivo infections. Given the importance of the genes involved in the transcription of these circRNAs, our study contributes to our understanding and complexity of this deadly pathogen. IMPORTANCE Circular RNAs (circRNAs) were rediscovered in recent years both in physiological and pathological contexts, such as in cancer. Viral circRNAs are encoded by at least two human herpesviruses, the Epstein Barr virus and the Kaposi’s Sarcoma-associated herpesvirus, both associated with the development of lymphoma. Marek’s disease virus (MDV) is a well-established animal model to study virus-induced lymphoma but circRNA expression has not been reported for MDV yet. Our study provided the first evidence of viral circRNAs that were expressed at key steps of the MDV lifecycle using genome-wide analyses of circRNAs. These circRNAs were primarily found in transcriptional units that corresponded to the major MDV virulence factors. In addition, we established a bioinformatics pipeline that offers a new tool to identify circular RNAs in other herpesviruses. This study on the circRNAs provided important insights into major MDV virulence genes and herpesviruses-mediated gene dysregulation.

regions of the genome from which abundant and linear alternative transcripts derived from complex splicing were discovered (39,44,(48)(49)(50). An RT-PCR approach, based on divergent primers that exclusively amplify circRNAs, was applied to total RNA extracts depleted of linear transcripts. This method was carried out on several relevant biological samples associated with the different phases of the viral cycle: (i) two commonly used permissive cell types (chicken embryonic fibroblasts, CEF, and ESCDL-1 cells) infected with the very virulent RB-1B strain as a model for the productive phases of the virus; (ii) a lymphomaderived cell line (MSB1) as a model for transformed T cells infected with MDV; (iii) MSB1 cells treated with sodium butyrate (NaBu) or 5'azacytidine as a model for virus reactivation as demonstrated previously (51); (iv) three samples harvested from RB-1B infected chickens (peripheral blood lymphocytes, spleen, and feather follicle epithelium) ex vivo RB-1B-infected T and B lymphocytes (52).
Inverse PCR products corresponding to circRNAs were observed from the two investigated loci in various conditions of the viral infection (Fig. 1). Several amplicons were obtained for both loci, which is indicative of circRNA isoforms processed from the meq and LATs transcriptional units. While the meq locus showed a constitutive expression of circRNAs regardless of the tested RNA sample, circRNA expression from the LATs locus varied according to the source of the RNA. PCR signals specific for circular transcripts were more obvious from PBL and splenocytes collected from in vivo infected animals and from lymphoma-derived cells reactivated with 5'azacytidine ( Fig. 1). Sanger sequencing of the inverse PCR-amplified meq and LATs products confirms the circularization event with the use of an acceptor splice site localized downstream of the donor splice site (Fig. 1). This discovery of the back-splice sites provides the first solid evidence that MDV encodes circular RNAs in vitro and in vivo.
The robustness of this method, based on inverse PCR amplification to identify circRNAs, was validated on both viral and cellular genes by assessing the enrichment of circular transcripts and degradation of their linear counterparts after RNase R treatment (Fig. 2). We performed PCR assays to monitor circular and linear forms of viral meq and host gga-WDR62 transcripts (53). The signal increase observed in Fig. 2A ensured the circular shape of both transcripts. The signal decrease seen in Fig. 2B showed that RNase R treatment targets and degrades linear transcripts.
Identification of canonical circRNAs expressed in the MDV repeat regions. After identifying the expression of circRNAs from two key virulence genes of the virus, we explored the transcription of circRNAs from the repeat regions of the MDV genome, which is known to express most of the pathogenesis-related genes as multiple spliced transcripts, in a nonexhaustive way. Twenty-eight primer pairs (19 in TR L /IR L and 9 in IR S /TR S ) were designed to target previously identified exons of pp14, meq/vIL-8, ERL, or LATs loci (Table 1). We used two cell lines to assess the different stages of the MDV infection cycle: ESCDL-1 cells infected with the MDV RB-1B strain representing lytic viral replication; MSB1 as a representative cell line of latency and transformation; and MSB1 cells treated with NaBu to mimic virus reactivation.

Viral Circular RNAs Encoded by MDV
Journal of Virology mainly constituted with previously described exons found in linear transcripts (39,44,(48)(49)(50), but we identified new donor or acceptor splice sites that lead to the RNA circularization in the first or the last exons of the linear transcripts like the third exon of vIL-8 or the first exon of the LATs (Fig. 4). The majority of these circRNAs contain multiple exons but a few circRNAs in the explored loci are constituted by only one exon. All the donor and acceptor splice sites are listed with their back-splicing association in Table S1. Altogether, viral circRNAs identified in the tested conditions present a high level of internal alternative splicings, such as intron retention, the use of an alternative donor or acceptor splice site, and exon skipping. Identification of noncanonical circRNAs expressed in the MDV repeat regions. Besides identifying circRNAs with canonical U2 back-splicing junction, the thorough analyses carried out on oriented transcripts revealed numerous circular sequences produced from noncanonical back-splicing events. To define the polarity of the matrix strand, the analyses were limited to transcripts encompassing at least one internal canonical splicing event. By using this selection, we identified 70 noncanonical circRNAs showing a complex alternative splicing pattern: 33 spliced noncanonical circRNAs in the R L (Fig. 5A) and 37 in the R S (Fig.  5B). Back-splicing events leading to the generation of these noncanonical circRNAs are localized in intronic or exonic parts of the splice sites.
For the TR L /IR L , we detected 13 noncanonical circRNAs in the sense strand of the pp14 locus (Fig. 5A). These circRNAs encompass the three pp14 exons and RLORF9 and follow the internal splicing identified for linear transcripts (50). For the miRNAs/meq/vIL-8 locus, 17 noncanonical circRNAs were identified (Fig. 5A). These circRNAs encompassed the miRNAs M9-M4 cluster and meq/vIL-8 transcript and as observed for the linear spliced transcripts, miRNAs are localized in the intronic part (48). Interestingly, the majority of these circRNAs use a donor back-splice site in the third exon of vIL-8. For the ERL locus, we found three noncanonical circRNAs in the 39 end of the ERL transcripts (Fig. 5A). As stated above for canonical ERLderived circRNAs, these noncanonical circRNAs are antisense to pp14. In the IR S /TR S , the whole series of 37 noncanonical circRNAs derived from the 59 part of the LATs (from exons 1 to 8), while only canonical circRNAs covered exons 8 to 12 (Fig. 5B). In addition, among the 518 candidate MDV circRNA clones, we analyzed 289 sequences devoid of internal splicing events and processed by noncanonical back-splicing. These circRNAs possess a high variability in size, ranging from 50 to 1428 nt, and in localization (unpublished data).
Identification of viral circRNAs during in vivo infection. To demonstrate MDV circRNA expression during tumorigenesis in the natural host, we used a complementary approach because our inverse PCR approach was limited to targeted regions. We scanned the fulllength viral genome to detect circRNA expression by taking advantage of high-throughput RNAseq data obtained after MDV in vivo infections. A bioinformatic pipeline was applied to extract viral sequences from a circRNA enriched data set (accession number GSE138600) that was previously exploited to characterize specific host circRNA signatures during Marek's disease virus infection (20). This pipeline can identify and quantify new back-spliced junctions in viral transcripts at a nucleotide-precision level, even the noncanonical ones. This analysis was conducted on nine infected animals among which four showed tumor development in the spleen (tumor-bearing animals, TA 1 ) and five that did not develop tumors (tumor-exempt animals, TA 2 ) (20).
After data processing and analyses, we identified numerous circRNAs expressed during MDV pathogenesis. Back-spliced junction counting and localization over the MDV genome revealed a common circRNA expression pattern for the four TA 1 samples (Fig. 6A to D). In contrast, data analysis from TA 2 samples did not lead to any circRNA mapping over MDV except for one sample (Fig. 6E). This sample presented the same profile as the TA 1 with a 5-fold lower circRNA hit count. We found four major loci with viral circRNA expression in the repeat regions, half of them corresponding to U2 back-splicing products and the other half to noncanonical ones. Strikingly the hit counts of the two U2-processed loci comprise the majority of the viral circRNAs. They were identified by open, unbiased sequence analysis and corresponded to the circRNAs identified by the inverse PCR strategy targeting key virulence genes, the LATs, and meq transcriptional units (Fig. 1). Back-spliced junction sites, spreading over the LATs exons 8 to 12, constituted 48% to 87% of circRNA hits in TA 1 samples while those spreading over the meq exons were 7% to 15% (Fig. 6).
In addition to U2 circRNAs, several noncanonical circRNAs were mapped in two main loci corresponding to pp14 and vTR transcriptional units (Fig. 6). TA 1 sample analyses revealed circRNAs processed from noncanonical back-splicing at the pp14 locus, with a differential rate according to the tested samples. Three of the TA 1 samples presented a low number of pp14 circRNAs while one TA 1 sample harbored a high noncanonical circRNA expression from this gene corresponding to 39% of all viral circRNAs (Fig. 6C). In contrast, we found noncanonical back-splicing sites at the vTR locus at low levels but consistently in the different spleen samples (Fig. 6).
The bioinformatic pipeline was also applied to another set of high-throughput RNAseq data obtained from an independent experiment investigating MDV in vitro productive infection and focusing on host circRNAs dysregulation (47). This RNAseq data set (accession number GSE166240) from infected chicken embryo fibroblasts (CEF) was processed to map back-spliced junctions over the MDV genome. This analysis showed a wider distribution of back-splicing sites and a higher proportion of noncanonical circRNAs during lytic MDV replication (Fig. 7). Canonical circRNAs were only identified from the LATs exon 8 to 12 locus. The highest density and variety of circRNAs are derived from the pp14 locus.

DISCUSSION
We designed this study to investigate whether an oncogenic avian herpesvirus expresses circRNAs at key steps of viral infection. Our strategy first focused on virulence genes with previously identified complex splicing patterns. The approach was then broadened to identify viral circRNA expression in vivo from any MDV transcriptional unit by using a newly developed bioinformatic pipeline that we set up to identify circRNA junction sites from existing RNA sequencing data.
The first series of experiments identified four hot spots of circRNA expression in virulence genes located in the repeat regions of the MDV genome. For some of the target transcripts, viral circRNA expression varied according to the investigated stage of the viral infection, lytic replication, latency, or reactivation. In-depth sequence analyses of each back-splicing site revealed the production of canonical and noncanonical viral circRNAs processed by the U2dependent or U2-independent splicing machinery, respectively. Therefore, this report provides new information on the production of noncanonical circRNAs during viral infection (6,25,26). The next set of experiments was based on open and unbiased analyses of lymphoma-derived sequences over the whole MDV genome and pointed out the meq and LATs loci as the central hubs of viral circRNA expression during tumorigenesis. This complementary and independent approach confirmed the production of noncanonical circRNAs through the description of back-spliced junctions, at the nucleotide precision level. Altogether, our data provided the first characterization of circRNA expression in Marek's disease virus infections. MDV circRNA expression is infection stage-dependent and circRNAs are processed from virulence-and latency-associated transcripts.
Given the first roles assigned to some of these viral circRNAs in human herpesviruses, follow-up functional studies should investigate the roles of circRNAs identified in the four loci of the MDV genome. Because the contribution of viral circRNAs in tumorigenesis cannot be Without reconsidering the roles of the Meq protein as a functional transcriptional factor controlling a large set of viral and cellular genes, and without questioning data showing the impact of meq sequence polymorphisms associated with various levels of MDV virulence (62,63), the phenotype of meq deletion mutants (38,55,(64)(65)(66) might be challenged by the present study. Mechanistic studies on the circRNAs produced from the meq transcriptional unit might reveal additional functions or another kind of regulation linked to the major MDV oncogene.
The pp14 locus is poorly characterized until now but has been associated with MDV neurovirulence (50, 56, 57). Interestingly, two features of circular pp14 transcripts can be is found in the close vicinity of the pp14 transcriptional initiation. Highly expressed circRNAs derived from lytic genes are also encoded from regions surrounding the herpes virus lytic origin of replication in four independent models: EBV (21), MHV68 (22), rLCV (22), KSHV (22). As suggested by Ungerleider et al. (21,22), this colocalization of OriLyt and circRNA locus raises two nonexclusive assumptions. On the one hand, the initiation of DNA replication may trigger back-splicing events and on the other hand, circular transcripts may act through base pairing at the replication origin to initiate DNA polymerization. The second pp14 feature is that several internal ribosome entry sites (IRES) were previously identified and shown functional, leading to the translation of pp14-b and RLORF9. As described for oncogenic peptides translated from circRNAs in the infection with high-risk human papillomavirus (23), this opens the possibility of pp14-derived proteins translated from MDV circRNAs using these cap-independent initiation sites.
In alphaherpesviruses, LATs belong to a family of RNAs that are spliced from a lncRNA antisense to the mRNA, encoding immediate-early transactivator proteins (39,45,46). In MDV, LATs were more thoroughly characterized at their 59 end, with the identification of several miRNA sequences processed from the first intron (39, 67) and a polymorphism affecting both the LATs promoter and the first exons according to the virulence level of the MDV strain (46,68,69). Functional characterization of this region focused on viral miRNAs involved in Marek's disease pathogenesis (70)(71)(72). One of these has been more deeply characterized because it was shown to target two viral genes involved in MDV productive infection and reactivation (39). These functional studies left the 39 end of the transcript unexplored. In this context, our data pointed out that the 39 part of the LATs, spanning from exons 8 to 12, is the most productive source of viral circRNAs in vivo, paving the way to functional characterization of this part of the LATs.
Regarding ERL, the last genomic locus where circRNAs were detected, a single study presented the hyperediting of a long and highly spliced ncRNA by ADAR-1 (44). Edited circRNA identification from this peculiar transcript fosters better knowledge of the role of these products antisense to pp14, of viral miRNAs and meq.
Research on circRNA recently enlightened a new back-splicing pattern (3)(4)(5)(6)(7). The footprint left by the splicing machinery did not indicate any known canonical nucleotide sequence, being independent of both U2-and U12-driven splicing processing. In the context of our study, two independent approaches (Fig. 5 to 7) led to the description of noncanonically spliced transcripts. Therefore, we suggest that noncanonical circRNA production might be a hallmark of MDV infection. In recent studies reporting on viral circRNAs, noncanonical backsplice junctions were also observed in KSHV and EBV (21,73). The precise mechanism behind this alternative splicing process is unknown yet. Because reverse genetics systems offer a straightforward approach for different viruses, the biogenesis of these noncanonical circular spliced transcripts might be tackled in these viral models. ICP27, a conserved protein in herpesviruses, was previously associated with splicing disruption in the context of viral infections (74)(75)(76). Making use of the available tools in MDV, ICP27 implication in noncanonical circRNA biogenesis should be addressed by using recombinant viruses deleted for ICP27 (77) or by using overexpression vectors (39,74). Recent knowledge about splicing may also shed new light on this regulation. Notably, emerging RNA modification studies have described a link between back-splicing and adenosine methylation (reviewed in reference (78)). Further studies should address the involvement of both the methylation enzyme METTL3 as well as ICP27 in circRNA biogenesis in Marek's disease context.
In conclusion, our study is the first report of viral circRNA expression in MDV infection and MDV-induced lymphoma in vitro and tumor samples collected from infected animals. Because viral circRNA expression correlates with transcriptional units of key virulence factors, the associated genes deserve a deeper characterization. The high diversity of back-spliced junctions discovered in our data set opens several perspectives regarding circRNA biogenesis.
Ex vivo infections of B and T lymphocytes were carried out using the RB-1B strain and following the guidelines described by Schermuly and collaborators (52) with small adaptations of the infection procedure (63,83). In vivo samples used in this study (spleen, feather follicles (FFE), and peripheral blood lymphocytes (PBL)) were obtained from previous experiments (51). They were conducted following Belgian and European laws, notably Directive 2010/63/EU. The ethics committees of Sciensano approved the two sets of experiments that we used in the present study (file numbers being LA1230174 and 20191016-03).
RNA extraction and circRNA enrichment. Total RNA was extracted from MDV-infected cells by using TRI Reagent (Invitrogen). Residual DNA was removed with DNase I (NEB) according to the manufacturer's instructions. After that, to remove linear RNAs, an RNase R (Lucigen) treatment was performed. 4 mg of total RNA was treated with 20 U of RNase R for 20 min at 37°C. Before and after the RNase R treatment, the samples were extracted with phenol-chloroform (Sigma).
RNase R-treated RNA was then reverse-transcribed by using random primers (50 mM) and Superscript III (Invitrogen) reverse transcriptase following the manufacturer's protocol. The efficiency of the RNase R treatment was evaluated by amplifying one host and one viral circular RNA and their linear counterparts from treated or untreated RNA (Fig. 2).
The resulting cDNAs were amplified by PCR (35 cycles  (Promega) and 0.2 mM each primer in the reaction buffer provided by the manufacturer. All the primers used in this study are reported in Table 1. For all the positive PCRs, amplicons were purified by using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel) and inserted into the pGEM-T easy vector (Promega). Around 20 clones were sequenced by Sanger sequencing (Eurofins Genomics) and the corresponding sequences were aligned against the MDV RB-1B genome sequence (Geneious software; Biomatters). To limit the false-positive detection of circRNA production caused by PCR artifacts, clones that did not respect the U2 canonical splice sites (GT/AG) were considered valid only if no more than 5 similar nucleotides were present at both sites of the back-splice junction.
Bioinformatic analyses. Two data sets from previous experiments (20,47) were downloaded from the SRA database (GEO accession numbers: GSE138600 and GSE166240). The first set of data ((20); data available under GSE138600) was obtained from infected chickens with (TA 1 ) and without (TA 2 ) MDV-induced T CD4 1 lymphomas and was analyzed through our pipeline. The second set of data ((47); data available under GSE166240) were gathered from chicken embryonic fibroblasts (CEF) infected by the Md5 strain of MDV. All the data obtained from this experiment were analyzed through our pipeline. Note that in the present study, the sequence libraries (GSM5066726, Md5-1 input; GSM5066727, Md5-2 input; GSM5066728, Md5-3 input) correspond to the total RNA extract following circular transcripts enrichment.
The reads were trimmed using Trimmomatic (84) following the publisher's protocol. The process included the trimming of the adapters, quality trimming (Sliding Window 4:15), and a minimal length filter of 70 nucleotides. The duplicated reads from PCR artifacts were deleted using dedupe.sh from the BBMap package (85). The reads were aligned on the MDV-Md5 genome (NC_002229) for the first time using Burrows-Wheeler Aligner (86) with options set on "mem -a -T15" and the unmapped reads were deleted using Samtools (Genome Research Ltd., 2021) according to the publisher's instructions.
The reads were filtered with a succession of in-house Python scripts. First, all the spliced reads were filtered by accepting only the reads presenting at least one primary and one supplementary alignment.
Next, the reads presenting a back-splicing signal were extracted. This filter consisted of the analyses of all the different mappings of each read individually. They were sorted following their location. If a sequence proximal in the read was aligned at a distal position in the genomic sequence (and vice versa), it was considered a potential circularization signal, and both the back-spliced junction and donor and acceptor splice sites were stored in a new file with the read IDs. To increase the specificity of our circRNA detection, several filters were put in place deleting (i) the reads that map repetitive sequences; (ii) the reads that present back-splice sites distant by more than 5000 bp on the genome; (iii) the reads with no crossmatch between their back-splice junction and the two corresponding splice sites; (iv) the reads reported with incomplete back-splice junctions or splice sites.
The back-splice junctions were counted according to their similarity. Ninety-five percent of similarity was considered a match allowing thereby one mismatch in the back-splice junction. The same 95% of similarity was also required between the reported back-splice sites. A filter was designed to delete the reads presenting long stretches (more than 5 nucleotides) of similarities at donor and acceptor backsplice sites. The idea was to avoid false positives due to PCR artifacts.
Using the ID list extracted from this count, the complete reads were extracted from the first SAM file to exhibit a proper alignment, processed with Samtools, and drawn using Geneious.
Data availability. All data and bioinformatic codes are available on reasonable request to Damien Coupeau (damien.coupeau@unamur.be).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.02 MB.